reading-notes

About pandas

Object creation

See the Data Structure Intro section.

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [3]: s = pd.Series([1, 3, 5, np.nan, 6, 8])

In [4]: s
Out[4]:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
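
The transcript above can be reproduced outside IPython with a short script. The following is a minimal sketch, not part of the original notes: it shows the default integer index, and it assumes a hypothetical set of string labels (`a` to `f`) to contrast with an explicit index. The dtype is float64 because `np.nan` is a float, which forces the integer values to be upcast.

```python
import numpy as np
import pandas as pd

# No index is passed, so pandas assigns the default integer index 0..n-1.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# Same values with explicit labels (hypothetical labels, for illustration only).
labeled = pd.Series([1, 3, 5, np.nan, 6, 8], index=list("abcdef"))

print(s.index)       # the default integer index
print(s.dtype)       # float64: np.nan forces the integers to floats
print(s.count())     # number of non-missing values
print(labeled["d"])  # label-based access to the missing entry
```

`count()` skips the missing entry, so it reports one fewer value than `len(s)`.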

Installation

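To install pandas from source you need Cython in addition to the normal dependencies. Cython can be installed from PyPI with pip install cython. With Cython available, build and install pandas in editable mode from the root of the source checkout:

python -m pip install -e . --no-build-isolation --no-use-pep517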
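
After installing, a quick sanity check is to confirm which pandas version Python actually imports and whether Cython is visible to the same interpreter. This is a small sketch using only the standard library and `pd.__version__`; the variable name `cython_available` is chosen here for illustration.

```python
import importlib.util

import pandas as pd

# find_spec returns None when a module cannot be located,
# so this check does not raise if Cython is absent.
cython_available = importlib.util.find_spec("Cython") is not None

print("pandas version:", pd.__version__)
print("Cython importable:", cython_available)
```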

Time series

pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (e.g., converting secondly data into 5-minutely data). This is extremely common in, but not limited to, financial applications. See the Time Series section.

In [104]: rng = pd.date_range('1/1/2012', periods=100, freq='S')

In [105]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)

In [106]: ts.resample('5Min').sum()
Out[106]:
2012-01-01    24182
Freq: 5T, dtype: int64
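
The output above has a single row because 100 seconds of data fit inside one 5-minute bin. The sketch below uses made-up data (600 one-second samples, all equal to 1) so that the binning is easy to verify by hand. It assumes the lowercase frequency aliases `"s"` and `"5min"`, which recent pandas releases prefer over `'S'` and `'5Min'`.

```python
import numpy as np
import pandas as pd

# 600 one-second samples cover exactly ten minutes (illustrative data).
rng = pd.date_range("2012-01-01", periods=600, freq="s")
ts = pd.Series(np.ones(len(rng), dtype="int64"), index=rng)

# Each 5-minute bin is labeled by its left edge and sums 300 samples.
five_min = ts.resample("5min").sum()
print(five_min)
```

Swapping `.sum()` for `.mean()` or `.max()` changes only the aggregation applied within each bin, not the binning itself.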

Summary statistics

df.describe(): Summary statistics for numerical columns
df.mean(): Returns the mean of each column
df.corr(): Returns the pairwise correlation between columns in a DataFrame
df.count(): Returns the number of non-null values in each DataFrame column
df.max(): Returns the highest value in each column
df.min(): Returns the lowest value in each column
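
The methods listed above can be tried on a small frame. This is a minimal sketch with invented data: the column names `a` and `b` and their values are assumptions for illustration, and one missing value is included to show that these methods skip NaN by default.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; column "a" has one missing value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, np.nan],
    "b": [2.0, 4.0, 6.0, 8.0],
})

print(df.describe())  # count, mean, std, min, quartiles, max per column
print(df.mean())      # NaN is skipped, so "a" averages its three values
print(df.count())     # non-null values per column
print(df.max())
print(df.min())
print(df.corr())      # computed on rows where both columns are present
```

Here `b` is exactly twice `a` on the rows where both are present, so the off-diagonal correlation comes out as 1 up to floating-point error.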

Plotting

In [131]: import matplotlib.pyplot as plt

In [132]: plt.close('all')

In [133]: ts = pd.Series(np.random.randn(1000),
   .....:                index=pd.date_range('1/1/2000', periods=1000))
   .....:

In [134]: ts = ts.cumsum()

In [135]: ts.plot()
Out[135]: <AxesSubplot:>
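
Outside an interactive session the plot has to be saved or shown explicitly. The sketch below repeats the steps above as a script under a few assumptions that are not in the original notes: the non-interactive `Agg` backend is selected so it runs without a display, a fixed seed is used for repeatability, and the figure is written to an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Seeded generator so repeated runs draw the same curve (illustrative choice).
gen = np.random.default_rng(0)
ts = pd.Series(gen.standard_normal(1000),
               index=pd.date_range("2000-01-01", periods=1000))
ts = ts.cumsum()  # running total turns noise into a random walk

ax = ts.plot()    # returns the matplotlib Axes the line was drawn on
ax.set_title("Cumulative sum of random steps")

# Render to an in-memory PNG; pass a path to savefig to write a file instead.
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
plt.close(ax.figure)
```

In an interactive session, `plt.show()` would take the place of the `savefig` call.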